TITLE OF THE INVENTION

INSTRUCTION SETS FOR PROCESSORS

BACKGROUND OF THE INVENTION
Field of the Invention 

The present invention relates to instruction sets
for processors. In particular, the present invention
relates to processors having two or more different
instruction sets. The present invention also relates
to methods of automatically encoding instructions for
such processors.

Description of the Related Art 

A high-performance processor is generally required
to have an instruction set which can meet two
requirements: compact code (so that the amount of
memory required to store the processor's program is
desirably small), and a rich set of operations and
operands. Such requirements are particularly important
in the case of an embedded processor, i.e. a processor
embedded in a system such as a mobile communications
device. In this case, high code or instruction density
is of critical importance because of the limited
resources of the system, for example in terms of
available program memory.

However, these two requirements tend to conflict
with one another and are difficult to achieve in a
single unified instruction set, as compact code
involves a minimal encoding for each of the most
frequent operations (eliminating the less frequent
operations from the instruction set) whereas a rich set
of operations and operands requires an orthogonal
32-bit reduced instruction set. Consequently, in a
processor having a pre-existing 32-bit instruction set
it has been proposed to add a compact 16-bit
instruction set which provides the most commonly-used
functions and/or access to a limited subset of register
operands.

Fig. 1 of the accompanying drawings shows
schematically the instruction sets in such a processor.
Internally, at the hardware level, the processor has a
set of 32-bit instructions ISint. Externally, the
processor has two instruction sets IS1 and IS2. The
first instruction set IS1 is made up of the same 32-bit
instructions as the internal instruction set ISint. The
second instruction set IS2 is made up of 16-bit
instructions and the processor contains instruction
translation circuitry 200 for translating each 16-bit
instruction of the external instruction set IS2 into a
corresponding one of the 32-bit instructions of the
internal instruction set ISint.

An embedded processor may also be a very long
instruction word (VLIW) processor capable of executing
VLIW instructions. The most important additional
feature of a VLIW processor is Instruction-Level
Parallelism (ILP), i.e. its ability to issue two or
more operations simultaneously when executing VLIW
instructions.

In such a VLIW processor an instruction issuing
unit has a plurality of issue slots, each connected
operatively to a different execution unit. It is
typical for a VLIW processor that issues two or more
instructions per processing cycle to encode each
instruction in a different format (or group of formats)
depending on the issue slot from which the instruction
will be issued. The instructions that will be issued
in the same processing cycle are combined together in a
VLIW packet or parcel. The position of an instruction
in the VLIW parcel determines the sub-set of formats in
which that instruction may be encoded. In this way,
formats for instructions destined for different
positions within the VLIW parcel can use identical
encodings without introducing ambiguity.

In practice, empirical observation suggests that
90% or more of the instructions within a program are
executed so infrequently that they make up 10% or less 
of the execution time. Naturally, the remaining 10% of 
the instructions occupy 90% of the execution time. 
Furthermore, it is often the case that the 
infrequently-executed parts of a program will not be 
able to make effective use of the processor's ability 
to issue two or more instructions simultaneously. If 
such parts of the program were encoded using a VLIW 
instruction set, a large proportion of the instructions 
would be "no operation" (NOP) instructions inserted in 
the program by the compiler simply to pad out the VLIW 
parcels when consecutive instructions cannot appear in 
the same VLIW parcel because the result of one 
instruction is used by the next. It follows that, for 
parts of a program where no effective advantage can be 
taken of the ability to issue instructions in parallel, 
or where any performance gain from that ability will 
have little impact anyway, it is desirable to encode 
the program to achieve maximum code density (i.e. using 
the smallest possible number of bits).
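
The padding cost described above can be illustrated with a
minimal sketch. It assumes a parcel of three 32-bit slots
and a 16-bit compact format; the slot count, the widths
and the function names are illustrative assumptions only.
The chain considered is one in which every instruction
uses the result of the previous one, so that only one
useful instruction fits in each VLIW parcel.

```python
def vliw_bits(n_dependent, slots=3, width=32):
    """Bits used when each dependent instruction needs its own parcel
    and the remaining slots of that parcel are padded with NOPs."""
    return n_dependent * slots * width


def compact_bits(n_dependent, width=16):
    """Bits used when the same chain is encoded one compact instruction at a time."""
    return n_dependent * width


def nop_fraction(slots=3):
    """Fraction of issued VLIW instructions that are NOPs for such a chain."""
    return (slots - 1) / slots


chain_length = 10  # assumed length of the dependent chain
print(vliw_bits(chain_length), compact_bits(chain_length), nop_fraction())
```

Under these assumptions the compact encoding of the chain
occupies one sixth of the bits of the padded VLIW encoding.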

Accordingly, it is desirable to provide a VLIW
processor with a compact-format instruction set, so as
to combine the instruction-level parallelism of VLIW
architecture with the compact code "footprint" of a
tightly-encoded instruction set such as a 16-bit
instruction set.

In the previously-proposed processor discussed
above with reference to Fig. 1, the compact instruction
set was added after the design of an original 32-bit
instruction set, with the result that the translation
from the 16-bit instructions into 32-bit instructions
is undesirably complex and slow.

It is therefore also desirable to design the
instruction-set formats and encodings in such a way
that the translation from each external instruction
format (e.g. at least one VLIW format, and at least one
compact format) into a form that can be executed
directly by hardware can be achieved more efficiently.

SUMMARY OF THE INVENTION

A processor embodying a first aspect of the 
present invention has "congruent" instruction 
encodings. In the simplest case this means that the 
processor has respective first and second external 
instruction formats in which instructions are received 
by the processor. Each instruction has an opcode which 
specifies an operation to be executed, and each 
external format has one or more preselected opcode bits 
in which the opcode appears. The processor also has an 
internal instruction format into which instructions in 
the external formats are translated prior to execution 
of the operations. The operations include a first 
operation specifiable in both the first and second 
external formats, and a second operation specifiable in 
the second external format. The first and second
operations have distinct opcodes in the second external
format. In each preselected opcode bit which the first
and second external formats have in common, the opcodes
of the first operation in the two external formats are
identical.

In a second aspect of the present invention there 
are provided congruent processor instruction encodings. 
The encodings have, in the simplest case, respective 
first and second external instruction formats in which 
the instructions are received by a processor. Each 
instruction has an opcode which specifies an operation 
to be executed, and each external format has one or 
more preselected opcode bits in which the opcode 
appears. The processor instructions in the external 
formats are translated into an internal instruction 
format prior to execution of the operations. A first
operation executable by the processor is specifiable in
both the first and second external formats, and a
second operation executable by the processor is
specifiable in the second external format. The first
and second operations have distinct opcodes in the
second external format. In each preselected opcode bit
which the first and second external formats have in
common, the opcodes of the first operation in the two
external formats are identical.

Such "congruent" instruction encodings can enable
a translation process, for translating the
external-format opcode into a corresponding internal-format
opcode, to be carried out simply and quickly without
the need to positively identify each individual
external-format opcode.

According to a third aspect of the present
invention there is provided a method of producing
congruent processor instruction encodings as set out
above. The method comprises: encoding the first and
second operations with distinct opcodes in the second
external format; and encoding the opcodes of the first
operation in the first and second external formats so
that, in each preselected opcode bit which the first
and second external formats have in common, the opcodes
of the first operation in the two external formats are
identical.

According to a fourth aspect of the present
invention there is provided a method of encoding
instructions for a processor having two or more
external instruction formats and one or more internal
instruction formats. The method comprises: (a)
selecting initial encoding parameters including a
number of effective opcode bits in each external and
internal format and a set of mapping functions. Each
mapping function serves to translate an opcode
specified by the opcode bits in one of the external
formats to an opcode specified by the opcode bits in
the, or in one of the, internal formats;
(b) allocating each operation executable by the
processor an opcode distinct from that allocated to
each other operation in each external and internal
format in which the operation is specifiable. The
allocated opcodes are such that each relevant mapping
function translates such an external-format opcode
allocated to the operation into such an internal-format
opcode allocated to the operation and such that all the
internal-format opcodes allocated to the operation have
the same effective opcode bits; and (c) if in step (b)
no opcode is available for allocation in each
specifiable format for every one of the said
operations, determining which of the said encoding
parameters is constraining the allocation in step (b),
relaxing the constraining parameter, and then repeating
step (b).

According to a fifth aspect of the present
invention there is provided a computer-readable
recording medium storing a computer program which, when
executed, encodes instructions for a processor having
two or more external instruction formats and one or
more internal instruction formats. The program
comprises a selecting code portion which selects
initial encoding parameters including a number of
effective opcode bits in each external and internal
format and a set of mapping functions. Each mapping
function serves to translate an opcode specified by the
said opcode bits in one of the external formats to an
opcode specified by the said opcode bits in the, or in
one of the, internal formats. An allocating code
portion allocates each operation executable by the
processor an opcode distinct from that allocated to
each other operation in each external and internal
format in which the operation is specifiable. The
allocated opcodes are such that each relevant mapping
function translates such an external-format opcode
allocated to the operation into such an internal-format
opcode allocated to the operation and such that all the
internal-format opcodes allocated to the operation have
the same effective opcode bits. If the allocating code
portion finds that no opcode is available for
allocation in each specifiable format for every one of
the said operations, a determining code portion
determines which of the encoding parameters is
constraining the allocation by the allocating code
portion, relaxes the constraining parameter, and then
the allocating code portion repeats its allocation
operation.

BRIEF DESCRIPTION OF THE DRAWINGS

Fig. 1, discussed hereinbefore, is a schematic 
diagram for use in explaining a previously-proposed 
processor having an additional compact instruction set; 

Fig. 2 shows parts of a processor embodying the
present invention;

Fig. 3(A) shows a schematic diagram for use in
explaining previously-considered instruction encodings;

Fig. 3(B) shows a schematic diagram corresponding
to Fig. 3(A) for use in explaining congruent
instruction encodings;

Figs. 4(A) and 4(B) present a flowchart for use in
explaining a method of encoding instructions embodying
the present invention;

Fig. 5 shows a schematic view of external and
internal instruction formats in a specific example;

Fig. 6 presents a table illustrating which
operations are specifiable in each external and
internal format in the Fig. 5 specific example;

Figs. 7(A) to 7(H) present schematic diagrams for
use in explaining different stages of an automatic
encoding method applied to the Fig. 5 specific example;
and

Fig. 8 shows the final instruction encodings
achieved by the method of Fig. 7.

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS 

Fig. 2 shows parts of a processor embodying the
present invention. In this example, the processor is a
very long instruction word (VLIW) processor. The
processor 1 includes an instruction issuing unit 10, a
schedule storage unit 12, respective first, second and
third VLIW translation units 4, 6 and 8, a scalar
translation unit 9, respective first, second and third
execution units 14, 16 and 18, and a register file 20.

The instruction issuing unit 10 has three issue
slots IS1, IS2 and IS3 connected respectively to the
first, second and third translation units 4, 6 and 8.
Respective outputs of the first, second and third
translation units 4, 6 and 8 are connected to
respective first inputs of the first, second and third
execution units 14, 16 and 18 respectively.

The instruction issuing unit 10 has a further
output SC connected to the scalar translation unit 9.
An output of the scalar translation unit 9 is connected
in common to a second input of each execution unit 14,
16 and 18.

A first bus 22 connects all three execution units
14, 16 and 18 to the register file 20. A second bus 24
connects the first and second execution units 14 and 16
(but not the third execution unit 18 in this embodiment)
to a memory 26 which, in this example, is an external
random access memory (RAM) device. The memory 26 could
alternatively be a RAM internal to the processor 1.

Incidentally, although Fig. 2 shows shared buses
22 and 24 connecting the execution units to the
register file 20 and memory 26, it will be appreciated
that alternatively each execution unit could have its
own independent connection to the register file and
memory.

The processor 1 performs a series of processing
cycles. The processor may operate selectively in two
modes: a scalar mode and a VLIW mode.

In scalar mode the processor executes instructions
from a particular instruction set (which may or may not
be distinct from the VLIW instruction set). In this
mode instructions are not issued at the issue slots IS1
to IS3.

In VLIW mode, on the other hand, the instruction
issuing unit 10 can issue up to 3 instructions in
parallel per cycle at the 3 issue slots IS1 to IS3,
i.e. the full instruction issue width is exploited.

Scalar-mode instructions and VLIW-mode
instructions are both stored together in the schedule
storage unit 12. The instructions are issued according
to an instruction schedule stored in the schedule
storage unit.

As explained later in more detail, instructions in
the instruction schedule are written in at least two
different external formats, including at least one
format belonging to a scalar instruction set of the
processor (hereinafter a "scalar format") and at least
one format belonging to a VLIW instruction set of the
processor (hereinafter a "VLIW format"). In practice,
there may be two or more scalar formats and two or more
VLIW formats. In the case of the VLIW formats it is
possible to have different formats for different issue
slots, although a format may be shared by two or more
issue slots.

On the other hand, within the processor each
execution unit executes instructions in at least one
internal format. Accordingly, each execution unit 14,
16 and 18 is provided with a translation unit 4, 6 or 8
which translates an instruction in one of the external
VLIW formats into the (or, if more than one, the
appropriate) internal format required by the execution
unit concerned. Similarly, the scalar translation unit
9 is provided for translating an instruction in one of
the external scalar formats into the (appropriate)
internal format required by the execution units.

After translation by the relevant translation unit
4, 6, 8 or 9 the instructions issued by the instruction
issuing unit 10 at the different issue slots or at the
scalar instruction output SC are executed by the
corresponding execution units 14, 16 and 18. Each of
the execution units may be designed to execute more
than one instruction at the same time, so that
execution of a new instruction can be initiated prior
to completion of execution of a previous instruction
issued to the execution unit concerned.

To execute instructions, each execution unit 14,
16 and 18 has access to the register file 20 via the
first bus 22. Values held in registers contained in
the register file 20 can therefore be read and written
by the execution units 14, 16 and 18. Also, the first
and second execution units 14 and 16 have access via
the second bus 24 to the external memory 26 so as to
enable values stored in memory locations of the
external memory 26 to be read and written as well. The
third execution unit 18 does not have access to the
external memory 26 and so can only manipulate values
contained in the register file 20 in this embodiment.

As outlined above, the architecture of the Fig. 2
processor defines a compact (e.g. 16-bit) instruction
set and a wider (e.g. 32-bit) VLIW instruction set.
There are at least two of these wider instructions in
each VLIW parcel. Instructions belonging to the
compact instruction set and the VLIW instruction set
are encoded using external formats.

There is also at least one internal instruction
format to which all instructions in an external format
are translated during execution.

Each VLIW parcel is made up of two or more 
instructions at different positions (slots) within the 
parcel. Each slot within a VLIW parcel may contain an
instruction encoded in one of several external VLIW 
formats. At least some fundamental operations provided 
by the processor (e.g. add, subtract or multiply) may 
need to be available in two or more, or possibly all, 
of the instruction slots of a VLIW parcel. In this 
case, the same fundamental operation may be encoded in 
a different external format per instruction slot. Of 
course, when the instructions in these different 
external formats are translated they must all have the 
same operation code (opcode) within the same group of 
bits in the or each internal format.

A fundamental operation may also need to be 
available using two or more scalar instructions, for 
example where the same fundamental operation is 
performed using two or more different types of operand 
or operand addressing. In this case, each of the two 
or more scalar instructions relating to the same 
fundamental operation must be encoded using a different 
scalar format and must translate to a different 
internal format. Again, when translated into an 
internal format, these two or more scalar instructions 
must have the same opcode as all VLIW-format
instructions for the same operation which translate to
the same internal format. Typically, the scalar
instruction set will be a sub-set of the full (VLIW)
instruction set, allowing a more compact encoding of
the external scalar formats.

The task of designing formats and assigning codes
to each operation in each format is complicated by the
fact that an operation X may appear in external formats
F1 and F2, whereas another operation Y may appear in the
external format F2 and in a further external format F3.
This means that the design of the external formats F1,
F2 and F3, and the choice of opcodes for operations X
and Y, are interdependent. Fig. 3(A) shows a simple
example of previously-considered instruction encodings.
In this example, an add operation appears both in
external formats F1 and F2. The add operation in both
formats F1 and F2 is mapped to the same internal format
G1. A load operation appears in the external format
F2 and in the further external format F3. The load
operation in both formats is translated into the same
internal format G2.

As shown in Fig. 3(A), in the different external
formats F1 to F3, different sets of bits are used for
specifying the opcode, i.e. the opcode fields are
different. In the format F1 the four bits from bit i+1
to bit i+4 are used to specify the opcode. In format
F2, the three bits from bit i+1 to bit i+3 are used to
specify the opcode. In format F3, the four bits from
bit i to bit i+3 are used to specify the opcode. The
opcode field for F2 may be shorter than for F1 and F3
because there are fewer operations available in F2, for
example.

In Fig. 3(A) the external formats F1 and F2 have
the bits i+1 to i+3 in common as opcode bits. For the
add operation in format F1 and the load operation in
format F2 these common bits i+1 to i+3 are the same,
even though the operations are different. This
complicates the translation process. For example, in
internal format G1 the add operation may have the opcode
"1011". The add operation in format F2 can be
translated into this internal-format opcode simply by
selecting "101" from F2 and appending a "1". However,
to translate the add operation in format F1 into this
internal-format code it is not possible to use a simple
selection operation. In this case it may be necessary
to examine all opcode bits i+1 to i+4 in the external
format F1 and match uniquely the pattern of bits
("1101") which identifies the add operation in format
F1. Anything short of this full examination might not
distinguish it from another operation in F1.

However, if it could be guaranteed that:

(i) the opcodes for "add" and "load" in format F2
are distinct, and the same is true for any other pair
of operations which appear together in the same format
F2 as well as in at least one other format; and

(ii) every operation that appears in two or more
external formats (i.e. the "add" operation and any
other which appears in F1 and F2, and the "load"
operation and any other which appears in F2 and F3) is
identically coded in all common opcode bits in all
those formats in which it appears;

then the translation process can be independent of
the opcodes themselves and can rely only on discovering
the external format (and, if there is more than one
internal format, the target internal format) of each
instruction. Instruction encodings which have this
property are referred to herein as "congruent"
instruction encodings.
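
Conditions (i) and (ii) can be checked mechanically. The
following is a minimal sketch, not part of the described
method, which represents each external format by the
offsets of its opcode bits from bit i and uses the opcode
values that the description gives for Fig. 3(B). Opcodes
are written in order of increasing bit position. The
names used, and the value of bit i of the F3 load opcode
(which the description does not state), are assumptions.

```python
# Opcode bit positions of each external format, as offsets from bit i.
OPCODE_BITS = {"F1": (1, 2, 3, 4), "F2": (1, 2, 3), "F3": (0, 1, 2, 3)}

# Opcode strings, lowest bit position first.
# The leading bit (bit i) of the F3 load opcode is an assumed value.
ENCODINGS = {
    "add": {"F1": "1011", "F2": "101"},
    "load": {"F2": "011", "F3": "0011"},
}


def bit_values(fmt, opcode):
    """Map each opcode bit position of a format to the bit held there."""
    return dict(zip(OPCODE_BITS[fmt], opcode))


def is_congruent(encodings):
    # Condition (i): opcodes are distinct within each format.
    seen = {}
    for per_format in encodings.values():
        for fmt, opcode in per_format.items():
            if opcode in seen.setdefault(fmt, set()):
                return False
            seen[fmt].add(opcode)
    # Condition (ii): an operation has the same value in every opcode bit
    # that two of its formats have in common.
    for per_format in encodings.values():
        for a in per_format:
            for b in per_format:
                va = bit_values(a, per_format[a])
                vb = bit_values(b, per_format[b])
                if any(va[p] != vb[p] for p in va.keys() & vb.keys()):
                    return False
    return True


print(is_congruent(ENCODINGS))
```

With the Fig. 3(A) value "1101" for the add operation in
F1 in place of "1011", the same check fails, because bits
i+2 and i+3 then differ between F1 and F2.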

In Fig. 3(B) the add and load operations of Fig.
3(A) have been allocated congruent instruction
encodings. It can be observed that the opcodes
assigned to the add instruction ("1011" in format F1 and
"101" in format F2) are identical in the three opcode
bits that are in common for the two formats F1 and F2
("101").

Similarly, in the case of the load operation
appearing in formats F2 and F3, the three opcode bits
that are in common for formats F2 and F3 are identical
("011") in F2 and F3.

Thus, the instruction encodings in Fig. 3(B) are
congruent. This means that the translation operation
performed by the translation unit can be a simple
bit-selection operation, for example to select some or all
of the bits from i+1 to i+4 in the case of translation
from external format F1 to internal format G1, selecting
some or all of the three bits from i+1 to i+3 in the
case of translation from external format F2 to either
internal format G1 or G2, and selecting some or all of
the four bits from i to i+3 when translating from
external format F3 to internal format G2. The
particular selection of bits required for a given
translation can then be determined simply by
identifying the external format and target internal
format. The identification of the external format can
be made by examining ID bits in the external formats,
for example the bits labelled F1 to F3 in Fig. 3(B).
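
A translation of this kind can be sketched as a table
lookup keyed only by the pair of formats. The sketch below
is illustrative: an instruction word is modelled as a
string in which character p holds bit p, the table and
function names are assumptions, and the fixed "1" appended
for the F2-to-G1 case follows the add example given
earlier rather than any stated general rule.

```python
# (external format, internal format) -> (bit offsets from bit i, fixed suffix)
SELECTION = {
    ("F1", "G1"): ((1, 2, 3, 4), ""),
    ("F2", "G1"): ((1, 2, 3), "1"),  # suffix assumed from the add example
}


def translate_opcode(word, i, ext_fmt, int_fmt):
    """Form the internal opcode by selecting bits; the opcode is never decoded."""
    offsets, suffix = SELECTION[(ext_fmt, int_fmt)]
    return "".join(word[i + k] for k in offsets) + suffix


# Hypothetical 8-bit words with i = 0: add is "1011" in F1 and "101" in F2.
f1_add = "0" + "1011" + "000"
f2_add = "0" + "101" + "0000"
print(translate_opcode(f1_add, 0, "F1", "G1"))
print(translate_opcode(f2_add, 0, "F2", "G1"))
```

Both calls produce the same internal opcode without either
word being matched against a full opcode pattern, which is
the simplification that congruent encodings are intended
to allow.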

The task of designing instruction formats and
opcodes having the property of congruence is not
difficult in the simple case illustrated in Fig. 3(B)
in which only two operations are considered. However,
when there are many operations in different external
formats which also appear in different internal formats
the task of designing formats and assigning opcodes
becomes very difficult. For example, a processor may
have approximately 32 to 128 instructions in its scalar
instruction set, 32 to 128 (or possibly double that)
instructions in its VLIW instruction set, and perhaps 3
to 6 different external formats and 4 to 6 different
internal formats.

This has meant that heretofore the translation
units used to carry out the translations have been
undesirably complex, leading to propagation delays and
excessive power consumption in previously-considered
processors.

Next, a method will be described for designing
automatically formats, opcodes and translations for
achieving congruent instruction encodings.

In order to describe this method for determining
opcode fields within instruction formats and deriving
congruent encodings in those formats let us begin by
defining the terms we shall use.

Let

    W = \bigcup_{j=1}^{N} G_j

be the set of all internal instructions, encoded in N
internal formats G_j.

Each internal format G_j is a proper subset of W,
and comprises a set of internal instructions defined by
the processor that is being implemented. If y is an
instruction encoded in format G_j, then the opcode for y
is given by function g_j(y) which selects a sub-field
containing a_j bits from the instruction format G_j.

Let F_i denote an external instruction format, where
i ∈ [1, M]. If x is an instruction encoded in format
F_i, then the code for x is given by the function f_i(x)
which selects a sub-field containing b_i bits from the
instruction format F_i.

Each internal instruction is represented in memory
by one or more external instruction formats. Where an
instruction is represented in two or more external
formats, each variant must translate to the same
internal opcode. These variants typically perform the
same function, though the types and representation of
their operands may differ.

The present explanation is concerned with the
process by which opcode field widths are determined,
and the process by which operation codes are assigned
in each format. The encoding of operands is also
important, but is independent of the issue of opcode
assignment and is therefore not addressed here.

A translation from external format to internal
format requires a mapping function m_i,j which maps the
b_i bits of opcode from F_i to the a_j bits of opcode in
G_j. For the purposes of simplicity in implementation and
tractability in design the mappings are preferably bit
selections or permutations. In this explanation it
will also be assumed that there is only one mapping
function for translating between any pair of external
and internal formats.

The instruction set architecture of the processor
defines for each internal instruction y an associated
set of translations, T_y, where each translation is a
pair (i, j) identifying an external format as the source
of the translation and an internal format as the
destination of the translation. For each translation
there must exist a mapping function m_i,j. Hence:

(eq 1)

Each format, whether internal or external, has a
cardinality determined by the number of opcodes within
the format. The cardinality of F_i is written |F_i|, and
hence the sizes of the opcode fields in external and
internal formats must satisfy the following
inequalities:

    a_j \ge \lceil \log_2 |G_j| \rceil , \qquad b_i \ge \lceil \log_2 |F_i| \rceil     (eq 2)



Each internal format G_j therefore defines opcodes
in the range [0, 2^{a_j}), and each external format F_i
defines opcodes in the range [0, 2^{b_i}). At any point
during the method Q_j contains the set of opcodes
available to be allocated to operations in internal
format G_j. Similarly, R_i contains the set of opcodes
available to be allocated to operations in external
format F_i.

The problem now consists of determining a unique
opcode for each instruction y ∈ W, and determining
suitable selection- or permutation-based mapping
functions for each translation defined in the
instruction set architecture. One preferred embodiment
of the method can now be expressed in pseudo-code,
using the terminology introduced above, as shown in the
flowchart of Figs. 4(A) and 4(B).

Each mapping function m_i,j initially maps a chosen
number b_i of effective opcode bits of the external
format F_i to a chosen number a_j of effective opcode bits
of the internal format G_j. This can map no more than
q = min(a_j, b_i) bits from external format F_i to a_j bits
in internal format G_j, setting any undefined bits among
the a_j bits to zero. For simplicity, it will be assumed
in this preferred embodiment that each mapping function
involves selecting all bits of the external-format
opcode to be some or all of the bits of the
internal-format opcode after translation. Other mapping
functions can be used in other embodiments of the
invention, for example mapping functions involving
permutations.

The method begins in step S1 by first computing
the minimum possible number a_j or b_i of opcode bits that
could theoretically encode the number of instructions
in each external format and each internal format. This
minimum possible number a_j or b_i is used as an initial
number of effective opcode bits for the format
concerned.
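
The computation in step S1 can be sketched as follows.
The minimum is taken here to be the smallest number of
bits whose opcode range is at least as large as the number
of operations in the format; the function names and the
per-format operation counts are hypothetical.

```python
def min_opcode_bits(n_operations):
    """Smallest b such that 2**b >= n_operations."""
    return max(n_operations - 1, 0).bit_length()


def initial_widths(format_sizes):
    """Step S1: initial effective opcode bits for every format."""
    return {fmt: min_opcode_bits(n) for fmt, n in format_sizes.items()}


# Hypothetical operation counts per external (F) and internal (G) format.
sizes = {"F1": 5, "F2": 3, "G1": 6}
widths = initial_widths(sizes)
print(widths, max(widths.values()))
```

The largest of the computed minima is also printed, since
it is one candidate for a common working width across all
formats.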

In step S2, a new series of iterations is started
(as explained later, several series may be required in
a practical situation). Firstly, for each internal
format G_j, a set Q_j of available opcodes is formed, made
up of all possible opcodes definable by the a_j bits.
Similarly, for each external format F_i, a set R_i of
available opcodes is assigned, made up of all possible
opcodes definable by the b_i bits. As explained later,
each available opcode may have a working number of bits
greater than the computed minimum possible number a_j or
b_i of opcode bits. For example, the working number for
all available opcodes in all sets Q_j and R_i may be set
equal to the highest computed minimum possible number a_j
or b_i.

Step S3 involves iterating through all operations
in the internal formats and determining their opcodes
in each external format where they occur.

During each series of iterations, steps S4 to S9
are performed per iteration. One fundamental operation
is considered per iteration. In step S4, for the
considered operation, the method examines the pair of
sets R_i and Q_j for the external format and internal
format of each mapping function needed to translate the
considered operation, and identifies as a mutual set h_t
any members the two sets of the pair have in common.
In step S5 a set H of common members of all the mutual
sets h_t for all the needed mapping functions is formed.
If the result is an empty set in step S6, then no
allowable mapping is found and the method goes to step
S11 where the constraints are relaxed. If H contains
at least one common opcode, step S7 selects the or one
of the common opcodes in H.

Then in step S8 the selected opcode is removed
from each set Ri and Qj for the external and internal
formats in which the considered operation appears, i.e.
the sets examined in step S4.

The method terminates when it is determined in
step S9 that the method has successfully allocated
opcodes to all operations in all the required external
and internal formats.

The method is guaranteed to terminate because the
back-tracking process in step S11 successively relaxes
the encoding constraints until there are as many opcode
bits as are needed to find a congruent assignment of
codes.

In addition to selecting bits from the external
format Fi, the mapping function may also permute the
bits. For example, the order of the bits may be
reversed by the mapping function. Such permutations
can be used when the number of mapped bits reaches q,
where q = min(aj, bi).

If p = max(aj, bi), then the total number of
possible permutations is p!/(p-q)!. Hence, for large
instruction sets, the number of possible permutations
could be very large. In practice, however, it is
typical for p to be about 5 and q to be about 3. This
means a maximum of 60 different permutation functions
for each mapping. Typically one might expect there to
be five different mappings, leading to a total of 60^5
possible sets of mapping functions to consider on each
iteration of the method defined by steps S4 to S9 (i.e.
778 million possibilities). This is within the
capabilities of a modern computer to enumerate and
evaluate automatically.
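
The arithmetic in the preceding paragraph can be checked with a
few lines of Python. This is only an illustrative sketch: the
variable names are not from the specification, and the values of
p, q and the number of mappings are the typical figures quoted
above.

```python
import math

p, q = 5, 3      # typical field widths quoted in the text
mappings = 5     # typical number of mapping functions

# p! / (p - q)!: ordered selections of q bit positions out of p
per_mapping = math.perm(p, q)

# one independent choice of permutation per mapping function
total = per_mapping ** mappings

print(per_mapping)  # 60
print(total)        # 777600000, i.e. roughly 778 million
```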

For larger field widths the number of possible
permutations grows intractably large. However, it is
still possible to operate the method successfully in
this case by restricting the class of permutations that
will be searched. For example, there are n(n+1)/2
possible permutations of an n-bit field defined by
swapping arbitrary pairs of bits. By choosing such a
restriction on the possible permutations to be examined
by the method, the running time of the method could be
constrained to be polynomial in n.

Next, operation of the method described with
reference to Figs. 4(A) and 4(B) will be illustrated
with reference to a specific example. In this example,
a VLIW processor, for example a processor generally in
accordance with Fig. 2, has the capability to issue two
instructions simultaneously from issue slots A and B
respectively.

Referring to Fig. 5, it can be seen that the
external VLIW formats allowed for instructions to be
issued from issue slot A include first and second
external VLIW formats F1 and F2. The opcode bits in
external format F1 are denoted by C1 in Fig. 5, and the
opcode bits in format F2 are denoted by C2.

In the case of instructions to be issued from
issue slot B, two external VLIW formats are also
available: one of them is the same external format F2 as
is available at issue slot A, and the other is a third
external VLIW format F3. The opcode bits in format F3
are denoted by C3 in Fig. 5.

In addition, the processor in this example is
capable of operating in a scalar mode to execute
instructions in one of two different 16-bit scalar
external formats F4 and F5. The opcode bits in format
F4 are denoted by C4 in Fig. 5, and the opcode bits in
format F5 are denoted by C5.

The processor in this example also has two
internal formats G1 and G2. The opcode bits in the
internal format G1 are denoted by Ca in Fig. 5, and the
opcode bits in internal format G2 are denoted by Cb.
Each scalar instruction translates into a single
operation in one or both of the internal formats G1 and
G2, encoded in either the Ca or Cb field.

As also shown schematically in Fig. 5, the
processor has three translation units, 30, 32 and 34.
The translation unit 30 corresponds to issue slot A and
is operable to translate opcode bits C1 in external
format F1 or opcode bits C2 in external format F2 into
either opcode bits Ca in internal format G1 or opcode
bits Cb in internal format G2.

Similarly, the translation unit 32 corresponds to
issue slot B and is operable to translate opcode bits C2
in external format F2 or opcode bits C3 in external
format F3 into opcode bits Ca in internal format G1 or
opcode bits Cb in internal format G2.

The translation unit 34 corresponds to the scalar
instructions and is operable to translate either opcode
bits C4 in external format F4 or opcode bits C5 in
external format F5 into opcode bits Ca in internal
format G1 or opcode bits Cb in internal format G2.

It will be appreciated that the translation units
30 and 32 in Fig. 5 correspond to the translation units
4, 6 and 8 in Fig. 2, and that the translation unit 34
in Fig. 5 corresponds to the translation unit 9 in Fig.
2.

Referring now to Fig. 6, the processor in the
present example has a small set of seven fundamental
operations: an addition operation add, a logical OR
operation or, a multiply operation mul, a load
immediate operation li, a subtraction operation sub, a
return from VLIW-mode operation rv and a division
operation div. The table presented in Fig. 6 lists
these seven fundamental operations in the first
(left-hand) column. The second column in Fig. 6
indicates in which internal formats the operation
concerned is permitted to appear. The add, or, mul, li
and sub instructions are permitted to appear in both
internal formats G1 and G2 and so have "G1" and "G2"
rows, but the rv and div instructions are only permitted
to appear in internal format G2 and so have no "G1" row.

The remaining six columns in Fig. 6 relate to the
five external instruction formats F1 to F5. The
external format F2 has two columns allocated to it in
this case, as this format is allowed at both issue slot
A and issue slot B.

Each cell in one of the six external-format
columns corresponds to an instruction. Some of the
cells are shaded whilst others are blank. An
instruction I in a cell at row Gj and column Fi must be
represented in external format Fi and must be translated
to internal format Gj if its cell is shaded. If the
cell is not shaded then the instruction I concerned is
not present in external format Fi. Take, for example,
the cell denoted by an asterisk in Fig. 6. This cell
is at row G1 for the or instruction, in one of the
external-format columns. The shading of the cell
indicates that the or instruction is present in the
external format of that column and in internal format
G1, requiring that opcodes for the or operation are
appropriately chosen in both formats and that a
translation exists for the or instruction between these
two formats.

The algorithm described previously with reference
to Figs. 4(A) and 4(B) will now be applied to the
present example of Figs. 5 and 6 to determine the
opcodes, the opcode field widths in each format, and
the mapping functions (translations) between formats.

The set W of fundamental operations in this
example can be written as:

W = {add, or, mul, li, sub, rv, div}

... (eq 3)

The number N of internal formats is 2 (G1 and G2),
and the number M of external formats is 5 (F1 to F5).

Looking at Fig. 6, for each external format Fi a
mapping function mi,j is required if, for any operation,
there is a shaded cell in row Gj. For example, taking
the external format F1, it can be seen that a mapping
function is required for internal format G1 but not for
internal format G2, as no cell in the F1 column is
shaded in a G2 row.

Thus, the following mapping functions are required
in the present example: m1,1, m2,1, m2,2, m3,2, m4,1,
m4,2, m5,1 and m5,2.

The translation pairs T for each operation, which
are derived directly from Fig. 6, are as follows:

T_add = {(1,1), (2,1), (2,2), (3,2), (4,1), (4,2), (5,1), (5,2)}
T_or  = {(1,1), (2,1), (2,2), (3,2), (4,1), (4,2), (5,1), (5,2)}
T_mul = {(1,1), (2,1), (2,2), (3,2), (4,1), (4,2), (5,1), (5,2)}
T_li  = {(2,1), (2,2), (4,1), (4,2)}
T_sub = {(1,1), (3,2), (5,1), (5,2)}
T_rv  = {(3,2)}
T_div = {(3,2), (5,2)}

... (eq 4)

In step S1 of the algorithm (Fig. 4(A)) the number
of opcodes required in each format is determined. For
each external format this is determined by observing
the number of operations for which there is at least
one shaded cell in the column for that external format.
For example, in the case of the external format F1 it
can be seen that four operations (add, or, mul and sub)
have a shaded cell in the column concerned. Where an
external format has two columns (such as the external
format F2) an operation is only counted once even if it
appears in one internal format in one column and
another internal format in another column. Thus, in the
case of the external format F2, the number of operations
|F2| is 4.

In the case of an internal format the number of
opcodes required is calculated by counting the total
number of rows (containing at least one shaded cell)
allocated to the internal format concerned. For
example, the internal format G1 has five rows with
shaded cells. The internal format G2 has seven rows
with shaded cells.

Thus, the numbers of opcodes required in the
different internal and external formats are: |G1|=5,
|G2|=7, |F1|=4, |F2|=4, |F3|=6, |F4|=4 and |F5|=5.

As a result, in step S1, the initial numbers of
effective opcode bits are determined as a1=3, a2=3,
b1=2, b2=2, b3=3, b4=2 and b5=3. These numbers represent
the minimum possible numbers of bits that could
theoretically encode the number of operations appearing
in the format concerned, and may have to be increased
in the course of execution of the algorithm.
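
The counting performed in step S1 can be sketched as follows.
This is a minimal illustration, not the implementation of the
specification: the names PAIRS, min_bits and format_sizes are
hypothetical, and the translation pairs are those of the present
example, with the pairs for li taken to be the four pairs
examined for li in the fourth iteration described below.

```python
# Translation pairs (external format i, internal format j) per operation.
ALL = [(1, 1), (2, 1), (2, 2), (3, 2), (4, 1), (4, 2), (5, 1), (5, 2)]
PAIRS = {
    "add": ALL, "or": ALL, "mul": ALL,
    "li":  [(2, 1), (2, 2), (4, 1), (4, 2)],   # F2-G1, F2-G2, F4-G1, F4-G2
    "sub": [(1, 1), (3, 2), (5, 1), (5, 2)],
    "rv":  [(3, 2)],
    "div": [(3, 2), (5, 2)],
}

def min_bits(n):
    """Smallest number of bits able to give n operations distinct codes."""
    return max(1, (n - 1).bit_length())

def format_sizes(pairs):
    """Collect the operations present in each external and internal format."""
    ext, internal = {}, {}
    for op, ts in pairs.items():
        for i, j in ts:
            ext.setdefault(i, set()).add(op)       # counted once per external format
            internal.setdefault(j, set()).add(op)  # one row per operation per internal format
    return ext, internal

ext, internal = format_sizes(PAIRS)
sizes_F = {i: len(ops) for i, ops in sorted(ext.items())}
sizes_G = {j: len(ops) for j, ops in sorted(internal.items())}
b = {i: min_bits(n) for i, n in sizes_F.items()}
a = {j: min_bits(n) for j, n in sizes_G.items()}

print(sizes_G, sizes_F)  # {1: 5, 2: 7} {1: 4, 2: 4, 3: 6, 4: 4, 5: 5}
print(a, b)              # {1: 3, 2: 3} {1: 2, 2: 2, 3: 3, 4: 2, 5: 3}
```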

In step S2, a set of available opcodes is created
for each external format and for each internal format,
as shown in equation 5.

R1 = {000, 001, 010, 011}
R2 = {000, 001, 010, 011}
R3 = {000, 001, 010, 011, 100, 101, 110, 111}
R4 = {000, 001, 010, 011}
R5 = {000, 001, 010, 011, 100, 101, 110, 111}
Q1 = {000, 001, 010, 011, 100, 101, 110, 111}
Q2 = {000, 001, 010, 011, 100, 101, 110, 111}

... (eq 5)

The working number of bits in each opcode is
initially set to be equal to the highest required
number of opcode bits amongst any of the internal and
external formats, i.e. 3 opcode bits as required by the
formats G1, G2 and F5. The initial set R1 of opcodes for
external format F1 is made up of four three-bit codes
000, 001, 010 and 011. Four codes are required as b1
was calculated to be 2 in step S1. The same is true
for the sets R2 and R4 of the other two-bit external
formats F2 and F4.

In the case of the external formats F3 and F5 eight
codes are required and the initial codes assigned to R3
and R5 are 000, 001, 010, 011, 100, 101, 110 and 111.

Each of the internal formats G1 and G2 also
requires eight codes (a1=3 and a2=3) so the initial sets
Q1 and Q2 of opcodes for these internal formats are
the same as the sets R3 and R5 for the external formats
F3 and F5.
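
The construction of the initial sets in step S2 can be sketched
as below. The helper name opcode_set is an assumption made for
illustration; the bit counts are those determined in step S1 of
the present example.

```python
def opcode_set(bits, width):
    """All opcodes definable by `bits` effective bits, as `width`-bit strings."""
    return {format(v, f"0{width}b") for v in range(2 ** bits)}

a = {1: 3, 2: 3}                    # internal formats G1, G2
b = {1: 2, 2: 2, 3: 3, 4: 2, 5: 3}  # external formats F1 to F5

# Working number of bits: the highest minimum over all formats (3 here).
width = max(list(a.values()) + list(b.values()))

Q = {j: opcode_set(n, width) for j, n in a.items()}
R = {i: opcode_set(n, width) for i, n in b.items()}

print(sorted(R[1]))          # ['000', '001', '010', '011']
print(len(R[3]), len(Q[1]))  # 8 8
```

Writing every code at the common working width keeps the sets
directly comparable, so that the intersections formed in steps
S4 and S5 are plain set operations.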

In step S3 a first series of iterations is
commenced, and in this first series the first operation
in Fig. 6, i.e. the add operation, is selected for
initial consideration.

In step S4, the available opcodes for the
operation that are unused (not yet allocated) in each
relevant pair of external and internal formats (8 pairs
in all: F1-G1, F2-G1, F4-G1, F5-G1, F2-G2, F3-G2, F4-G2,
F5-G2 in this case) are considered. Because no opcodes
have yet been allocated, for the 5 pairs F1-G1, F2-G1,
F4-G1, F2-G2 and F4-G2 ht = {000, 001, 010, 011} while
for the 3 pairs F5-G1, F3-G2 and F5-G2 ht = {000, 001,
010, 011, 100, 101, 110, 111}. Thus, in step S5
H = {000, 001, 010, 011}.

In step S6 it is checked whether H is empty. In
this case it is not, so processing proceeds to step S7.
Here, the opcode c=000 is selected first from H. The
opcode 000 therefore becomes allocated to the add
operation.

In step S8 the internal-format opcode sets Q1 and
Q2 are updated to remove therefrom the opcode 000, if
contained therein. Thus, the code 000 is removed from
each of the sets Q1 and Q2.

Also in step S8 the set of available opcodes for
each relevant external format (in this case all of the
external formats F1 to F5) is updated to remove
therefrom the opcode 000, if contained therein. Thus,
000 is removed from each of the sets R1 to R5.

The results of the allocations performed in the
first iteration are shown in Fig. 7(A). In Figs. 7(A)
to 7(H) the opcodes remaining in the sets Q or R are
shown. Also, any opcode allocations made in the
external and internal formats are entered in the
relevant cells.

Processing then returns to step S3 for the second
iteration of this series. In the second iteration, the
or operation is considered. The pairs to be considered
in step S4 are the same as for the first iteration.
The results of steps S4 and S5 are that H = {001, 010,
011}. Thus, in step S6, H is not empty and processing
proceeds to step S7. In step S7 the opcode c=001 is
selected. Accordingly, in step S8, the opcode 001 is
removed from each of the sets Q1 and Q2 of available
opcodes for the internal formats G1 and G2. Similarly,
in the sets R1 to R5 for the external formats F1 to F5,
the code 001 is removed. The results after the second
iteration are shown in Fig. 7(B).

In the third iteration, the mul operation is
considered. Again, the pairs to be considered in step
S4 are the same as for the first and second iterations.
In this case, the result H of the computation performed
in step S5 is {010, 011}, so that, in step S7, the
opcode 010 is selected. In step S8 the opcode 010 is
removed from all the sets Q1, Q2 and R1 to R5.

Thus, 010 becomes allocated to the mul operation. 
Fig. 7(C) shows the state reached at this time. 

In the fourth iteration of the series the li
instruction is considered. In this case the pairs to
be examined in step S4 are F2-G1, F4-G1, F2-G2 and F4-G2.
In step S5 of this iteration it is determined that
H = {011}. As the H set is not empty, processing goes on
to step S7. Here, the code 011 is selected (it is the
only code available in the set H). The code 011
therefore becomes assigned to li. This code is removed
from the relevant sets Q1, Q2, R2 and R4, but is left in
the sets R1, R3 and R5. The resulting state is shown in
Fig. 7(D).

In the fifth iteration, the sub instruction is
considered. In step S4 the set of translations T_sub =
{(1,1), (3,2), (5,1), (5,2)}. Accordingly, as the
pairs of external and internal formats for these
translations are F1-G1, F3-G2, F5-G1 and F5-G2, the
common sets ht are {} for F1-G1 and {100, 101, 110, 111}
for F5-G1, F3-G2 and F5-G2.

This means H = {} (the empty set) in step S5. This is
because, although 100, 101, 110 and 111 are still unused
in R3, R5, Q1 and Q2, none of these codes is available
in the remaining relevant set R1, which only contains
011. Accordingly, processing proceeds via step S6 to
step S11 in which the constraint is assessed. It is
determined that the intersection between R1 and Q1 (and
between R1 and Q2) is the empty set. Since R1 has fewer
members than Q1 and Q2 it can reasonably be concluded
that R1 is the constraining factor. To overcome this
constraint the number of effective opcode bits for F1
needs to be increased beyond its initial value of 2.
Accordingly, b1 is increased by one to 3. The remaining
values b2 to b5, a1 and a2 are left unchanged.

Now, all of the existing opcode assignments are
void and a second series of iterations is commenced at
step S2. In this series of iterations R1 = {000, 001,
010, 011, 100, 101, 110, 111} initially. In the fifth
iteration of this second series the sub instruction is
again considered. At this stage the state is shown in
Fig. 7(E).

This time, in step S5 H = {100, 101, 110, 111}. In
step S7 the opcode 100 is selected. In step S8, 100 is
removed from R1, R3, R5, Q1 and Q2. The resulting state
is shown in Fig. 7(F).

In the sixth iteration of the second series, the
rv instruction is considered for the first time. In
step S5 H = {101, 110, 111}. In step S7 the opcode 101
is selected. In step S8, 101 is removed from R3 and Q2.
The resulting state is shown in Fig. 7(G).

In the seventh iteration of the second series, the
div instruction is considered for the first time. In
step S5 H = {110, 111}. In step S7 the opcode 110 is
selected. In step S8, 110 is removed from R3, R5 and
Q2. The resulting state is shown in Fig. 7(H).
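
The two series of iterations traced above can be reproduced by
the following sketch. It is a simplified illustration under
stated assumptions rather than the method of Figs. 4(A) and 4(B)
itself: step S7 is assumed always to pick the lowest common
code, and step S11 is assumed to widen the involved format that
has the fewest remaining codes. The names allocate and PAIRS are
hypothetical.

```python
ALL = [(1, 1), (2, 1), (2, 2), (3, 2), (4, 1), (4, 2), (5, 1), (5, 2)]
PAIRS = {
    "add": ALL, "or": ALL, "mul": ALL,
    "li":  [(2, 1), (2, 2), (4, 1), (4, 2)],
    "sub": [(1, 1), (3, 2), (5, 1), (5, 2)],
    "rv":  [(3, 2)],
    "div": [(3, 2), (5, 2)],
}

def allocate(pairs, a, b):
    """Allocate one opcode per operation, restarting after each relaxation."""
    a, b = dict(a), dict(b)
    while True:                                   # one pass = one series
        width = max(list(a.values()) + list(b.values()))
        Q = {j: set(range(2 ** n)) for j, n in a.items()}   # step S2
        R = {i: set(range(2 ** n)) for i, n in b.items()}
        codes = {}
        for op, ts in pairs.items():              # step S3
            H = None
            for i, j in ts:                       # steps S4 and S5
                h_t = R[i] & Q[j]
                H = h_t if H is None else H & h_t
            if not H:                             # step S6, then S11
                # Assumed rule: the smallest involved set is the constraint.
                involved = [(len(R[i]), "b", i) for i in {i for i, _ in ts}]
                involved += [(len(Q[j]), "a", j) for j in {j for _, j in ts}]
                _, kind, k = min(involved)
                if kind == "b":
                    b[k] += 1
                else:
                    a[k] += 1
                break                             # void assignments, restart
            c = min(H)                            # step S7
            codes[op] = c
            for i, j in ts:                       # step S8
                R[i].discard(c)
                Q[j].discard(c)
        else:                                     # step S9: all allocated
            return {op: format(c, f"0{width}b") for op, c in codes.items()}, a, b

codes, a, b = allocate(PAIRS, {1: 3, 2: 3}, {1: 2, 2: 2, 3: 3, 4: 2, 5: 3})
print(codes)  # add 000, or 001, mul 010, li 011, sub 100, rv 101, div 110
print(b)      # {1: 3, 2: 2, 3: 3, 4: 2, 5: 3}
```

With these inputs the first series fails at sub, b1 is raised
from 2 to 3, and the second series completes with the same
allocations as in Figs. 7(E) to 7(H).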

At this point all instructions have been allocated
opcodes and the processing moves to step S10. In this
step the opcodes assigned so far are examined to
determine how many bits in each external format
actually need to be provided in the instructions in the
external format concerned. For example, in the
external format F4 all the allocated codes 000, 001, 010
and 011 have the prefix 0. This means that the prefix
0 is entirely redundant in external format F4.
Accordingly, provided that the format F4 can still be
distinguished from all other external formats, the
prefix 0 can be omitted from instructions in format F4
so that only a 2-bit opcode field is required for
format F4. The same is true for external format F2.

It follows of course that the mapping functions
m4,1, m4,2, m2,1 and m2,2 must insert the 0 prefix during
translation so that the add, or, mul and li operations
in format F4 are distinguished from the sub, rv and div
operations in formats F1, F3 and F5.

This optimisation step S10 becomes particularly
important when the number of prefix bits is greater
than the number of bits in each instruction set needed
to give each operation a distinct opcode in each
external format.
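
The prefix check of step S10 can be sketched as follows. The
function names and the ALLOCATED table are illustrative
assumptions; the codes listed are those allocated in the present
example, grouped by the external format in which each operation
appears.

```python
ALLOCATED = {
    "F1": ["000", "001", "010", "100"],                # add, or, mul, sub
    "F2": ["000", "001", "010", "011"],                # add, or, mul, li
    "F3": ["000", "001", "010", "100", "101", "110"],  # add, or, mul, sub, rv, div
    "F4": ["000", "001", "010", "011"],                # add, or, mul, li
    "F5": ["000", "001", "010", "100", "110"],         # add, or, mul, sub, div
}

def common_prefix(codes):
    """Longest leading bit string shared by every code in the list."""
    prefix = codes[0]
    for c in codes[1:]:
        while not c.startswith(prefix):
            prefix = prefix[:-1]
    return prefix

def shorten(codes):
    """Drop the redundant prefix; return it with the shortened codes."""
    prefix = common_prefix(codes)
    return prefix, [c[len(prefix):] for c in codes]

def translate(prefix, short_code):
    """What a mapping function does on translation: re-insert the prefix."""
    return prefix + short_code

for name, codes in ALLOCATED.items():
    prefix, short = shorten(codes)
    print(name, repr(prefix), short)
# F2 and F4 share the prefix '0' and need only 2 opcode bits;
# F1, F3 and F5 have no common prefix and keep all 3 bits.
```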

The final opcodes after optimisation are shown in
Fig. 8.

A method embodying the present invention can be
implemented by a general-purpose computer operating in
accordance with a computer program. This computer
program may be carried by any suitable carrier medium
such as a storage medium (e.g. floppy disk or CD-ROM)
or a signal. Such a carrier signal could be a signal
downloaded via a communications network such as the
Internet. The appended computer program claims are to
be interpreted as covering a computer program by itself
or in any of the above-mentioned forms.

Although the above description relates, by way of
example, to a VLIW processor it will be appreciated
that the present invention is applicable to processors
other than VLIW processors. A processor embodying the
present invention may be included as a processor "core"
in a highly-integrated "system-on-a-chip" (SOC) for use
in multimedia applications, network routers, video
mobile phones, intelligent automobiles, digital
television, voice recognition, 3D games, etc.



